Digital data signals are usually equalized by passing samples of
the received signal through an adaptive equalizer consisting of a 
tapped delay line having adjustable coefficients (tap weights). The 
equalizer tap weights are adjusted by starting the transmission with 
a short training sequence of digital data known in advance by the 
receiver. This paper analyzes the situation when the known training 
sequence is replaced by a sequence of data symbols estimated from 
the equalizer output and treated as known data. Such procedures are 
called "decision-directed" startup. With a known training sequence, 
the "least-mean-square" adjustment algorithm corresponds mathe- 
matically to searching for the unique minimum of a quadratic "error" 
surface whose unimodal nature assures convergence. In decision- 
directed startup, by contrast, the use of estimated and unreliable 
data changes the error surface into a multimodal one so that complex 
behavior may result. We describe the nature of the error surfaces for 
binary and four-level transmission, thereby gaining insight into
convergence problems. The most significant conclusion is that a poor
choice for the initial tap settings may result in the taps converging to 
an undesirable setting. We show that, because of finite step-size 
effects, fluctuations are significant at the undesired settings and 
cause the spurious capture to have a long, but finite, duration. Finally 
we provide information on stability, convergence times, and lifetimes 
and their relation to the adaptation parameter (step size). 

I. INTRODUCTION 

In high-speed data transmission (4.8 or 9.6 kilobit/s) over voice-grade
telephone channels, it is necessary to compensate for the linear
amplitude and phase distortion to which the data signal will be 
subjected. This compensation is usually accomplished by passing sam- 
ples of the received signal through an adaptive equalizer consisting of 
a tapped delay line having adjustable coefficients (tap weights). 
Since the distortion is initially unknown, the tap weights must be
suitably adjusted. Conventionally, the equalizer tap weights are 
adapted by starting the transmission with a short training sequence of 
digital data known in advance by the receiver. The receiver then uses 
the difference between the equalizer output signal and the known data 
to adjust the tap weights. 

In modern data-communication environments the above method 
may not always be practical, and thus new training procedures which 
do not make use of a known data-training sequence are required. 

A natural suggestion is to replace the known training sequence with 
a sequence of data symbols estimated from the equalizer output, and 
treat these as if they were known data. Such procedures are often 
called "decision-directed" startup. However, when these decision-di- 
rected startup procedures are used the estimated data may be unreli- 
able, so that it is not even certain that the tap weights will converge to 
their correct settings. 

For example, assume that there are $N$ tap weights, $c_1, c_2, \ldots, c_N$, to
be adjusted. The collection of these numbers is to be regarded as a
vector $\mathbf{c}$ in an abstract $N$-dimensional space. For the case of a known
training sequence, the conventional tap-adjustment algorithm for find- 
ing the optimum tap settings (called the least-mean-square algorithm) 
corresponds mathematically to searching for the unique minimum of 
a certain quadratic "error" surface defined in this c space. The simple 
unimodal nature of this surface assures convergence. In decision-directed
startup, by contrast, the use of estimated and unreliable data
changes the error surface being searched into a multimodal one, so 
that quite complex behavior may result. The local minima are of two 
types. First there are the desired local minima, ones whose positions 
correspond to tap settings yielding the same performance as if known 
data were used. Second, there are the undesired, or extraneous, local 
minima which appear at positions corresponding to tap settings yield- 
ing inferior equalizer performance. 

We begin our work in Section III by describing the nature of the 
decision-directed error surfaces for binary (±1) and four-level (±1, ±3) 
transmission. In general, the surfaces in N dimensions are too complex 
for an exact description to be given. However, low-dimensional exam- 
ples give considerable insight into the problems encountered with 
convergence. Our most significant conclusion is that a poor choice for 
the initial tap settings may result in the taps converging to an unde- 
sirable setting and remaining there for a long time. In Section IV we 
show that although random fluctuations of the taps about the desired 
minima are small, this is not the case for random fluctuations about 
the extraneous minima. Rather, being trapped at one of the extraneous 
minima is an event having a long but finite lifetime. These lifetimes 
depend on the geometry of the error surface, and on the adaptation 
parameter (step size) of the algorithm. Finally, we give quantitative
information on stability, convergence times, and lifetimes, and their 
relation to the step size, although it is sometimes necessary to resort 
to approximations and idealized geometries to do so, even for the 
simplified mathematical model that we consider. 

Gitlin and Werner¹ have made an experimental study of decision-directed
startup. They discovered that using the least-mean-square
algorithm to update the tap weights works with estimated data in the
binary case, but not in the four-level case. Other surprising phenomena
have been observed. For example, E. Y. Ho² observed that occasionally,
with four-level data, the signal constellation at the equalizer output 
would be perfect, except for one item. The signal points would be so 
reduced in amplitude that all data would be decoded as ±1, yielding 
an error rate of one-half. No ready explanation for these observations 
was at hand. Subsequent prodding from J. Salz led us to conduct the
present investigation, and, in the course of our general study, expla- 
nations of the above phenomena were found. 

II. MODEL AND REVIEW 

We begin this section with a description of the model that we use. 
As with many mathematical investigations, we have a choice as to 
what should be included in the formulation of the problem. Here, 
although we make several simplifications for mathematical tractability, 
our simplified model will provide an understanding of the unusual 
phenomena that have been observed in the experimental studies of 
decision-directed startup. 

The model is as follows: We consider baseband transmission of 
independent, equiprobable, binary or four-level data over a noiseless 
channel with no distortion. The receiver is an $N$-tap synchronous
equalizer whose initial tap setting is assumed arbitrary. We study the 
subsequent convergence of the tap vector to a final value when the 
mean-square-adjustment algorithms are modified in an obvious way 
to include the case of estimated data. The equations appropriate to 
the model are given at the start of Section III. 

As to notation, the vector of real tap coefficients is denoted by
$\mathbf{c} = (c_1, \ldots, c_N)$.† The $k$th data symbol is denoted by $a_k$, while at time $n$,
$n = 0, 1, 2, \ldots$, the equalizer output $y_n$ is, for any fixed tap setting $\mathbf{c}$,

$$y_n = \mathbf{c}\cdot\mathbf{a}_n, \tag{1}$$

where

$$\mathbf{a}_n = (a_n, a_{n+1}, \ldots, a_{n+N-1}) \tag{2}$$

is the vector of $N$ channel samples that would be stored in the equalizer
for this ideal situation.

† In the mathematics, all vectors are column vectors. In sentences, or in listing, the
vector is, for typographic convenience, written as a row without using the usual
superscript plus ($+$) to denote transposition.

Ideally, we desire the output sequence (1) to be the sequence of data 
symbols. For this to occur, an ideal tap vector might be, for example, 
$\mathbf{c} = (1, 0, 0, \ldots, 0)$ or, in fact, any vector $\mathbf{c}$ having exactly one
component unity and the rest zero. For each such choice of tap vector 
the sequence of data values is reproduced, but with a different (and 
unimportant) time delay. For the present problem the set of desirable 
tap vectors must be enlarged to include the negatives of those just 
described, as well. The data must then be differentially encoded. 

We conclude this section with a review of the least-mean-square 
(lms) algorithm, and its analysis, for the case when the data are known 
by the equalizer. This review has two purposes. First it prepares the 
way for similar considerations in blind startup, and also it introduces 
some approximations that will be used throughout the paper. 

With known data, the optimum tap vector is defined to be the
vector, $\mathbf{c}_{\mathrm{opt}}$, which minimizes the mean-square error $\mathcal{E}^2$, where

$$\mathcal{E}^2 = E[\mathbf{c}\cdot\mathbf{a}_n - a_l]^2, \tag{3}$$

$E$ denoting expectation with respect to all data symbols, and $\mathbf{c}\cdot\mathbf{a}$
denoting the inner product between the vectors $\mathbf{c}$ and $\mathbf{a}$. Of course,
$\mathbf{c}\cdot\mathbf{a} = \mathbf{c}^{+}\mathbf{a}$. Regarding (3), note that (2) restricts $l$ to satisfy
$n \le l \le n + N - 1$ so that a meaningful problem will result. For definiteness we
choose $l = n$ so that (3) becomes

$$\mathcal{E}^2 = E[\mathbf{c}\cdot\mathbf{a}_n - a_n]^2, \tag{4}$$

where, again, the expectation is with respect to all the data symbols
$\{a_n\}$, and $\mathbf{c}$ is a generic point in tap space. Regarded as a function of
$\mathbf{c}$, the right member of (4) describes the mean-square-error surface.
We observe that the data symbol $a_n$ satisfies

$$E a_n = 0, \tag{5}$$

$$\sigma_a^2 = E a_n^2 = \begin{cases} 1, & \text{binary} \\ 5, & \text{four-level,} \end{cases} \tag{6}$$

while the data vector $\mathbf{a}_n$ satisfies, using independence of the data
symbols,

$$E\,\mathbf{a}_n \mathbf{a}_n^{+} = \sigma_a^2 I. \tag{7}$$

In (7), $I$ is the identity matrix for $N$ dimensions. It is also customary
to denote $E a_n \mathbf{a}_n$ by $\sigma_a^2 \mathbf{v}$, $\mathbf{v} = (1, 0, 0, \ldots, 0)$. Then, from (4), the error
surface $\mathcal{E}^2$ may be written

$$\mathcal{E}^2 = \sigma_a^2[1 + \mathbf{c}^{+}\mathbf{c} - 2\mathbf{c}^{+}\mathbf{v}]. \tag{8}$$

This surface is convex and has a unique minimum at $\mathbf{c} = \mathbf{c}_{\mathrm{opt}} = \mathbf{v}$. The
value $\mathcal{E}_0^2$ of $\mathcal{E}^2$ at the minimum is

$$\mathcal{E}_0^2 = 0. \tag{9}$$

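In applications, the error surface is unknown, and an iterative
gradient search, called the lms algorithm, is used to find $\mathbf{c}_{\mathrm{opt}}$. If at the
$n$th iteration the tap vector had the value $\mathbf{c}_n$, the known-data algorithm
for our model is

$$\mathbf{c}_{n+1} = \mathbf{c}_n - \alpha e_n \mathbf{a}_n, \tag{10}$$

where

$$e_n = \mathbf{c}_n\cdot\mathbf{a}_n - a_n \tag{11}$$

is the instantaneous output error. The step-size parameter $\alpha$ determines
the stability and speed of convergence. To see this explicitly,
denote the error vector after the $n$th iteration, $\mathbf{c}_n - \mathbf{c}_{\mathrm{opt}}$, by $\boldsymbol{\epsilon}_n$. Then
it can be shown that (10) and (11) give

$$\boldsymbol{\epsilon}_{n+1} = (I - \alpha\,\mathbf{a}_n\mathbf{a}_n^{+})\boldsymbol{\epsilon}_n, \tag{12}$$

and so

$$E\|\boldsymbol{\epsilon}_{n+1}\|^2 = E\,\boldsymbol{\epsilon}_n^{+}(I - \alpha\,\mathbf{a}_n\mathbf{a}_n^{+})^2\boldsymbol{\epsilon}_n. \tag{13}$$

To evaluate the expectation in the right member of (13), it is standard
practice to assume $\mathbf{c}_n$ and $\mathbf{a}_n$ are statistically independent. This
assumption works surprisingly well in practice, and here and henceforth
in our paper this so-called "independence assumption" is made. More
innocently, because it may be checked by exact calculation, we
approximate†

$$E(\mathbf{a}_n\mathbf{a}_n^{+})(\mathbf{a}_n\mathbf{a}_n^{+}) = E[(\mathbf{a}_n\cdot\mathbf{a}_n)\,\mathbf{a}_n\mathbf{a}_n^{+}] \approx E(\mathbf{a}_n\cdot\mathbf{a}_n)\,E(\mathbf{a}_n\mathbf{a}_n^{+}) = N\sigma_a^4 I. \tag{14}$$

Thus (13) becomes, on taking the expectation,

$$E\|\boldsymbol{\epsilon}_{n+1}\|^2 = (1 - 2\alpha\sigma_a^2 + N\sigma_a^4\alpha^2)\,E\|\boldsymbol{\epsilon}_n\|^2. \tag{15}$$

From (15) we see that the algorithm converges if $(1 - 2\alpha\sigma_a^2 + N\sigma_a^4\alpha^2) < 1$
or, in other words, if

$$0 < \alpha < \frac{2}{N\sigma_a^2}. \tag{16}$$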



† We have exactly that $E[(\mathbf{a}_n\cdot\mathbf{a}_n)\,\mathbf{a}_n\mathbf{a}_n^{+}] = N\sigma_a^4 I + [E a_n^4 - \sigma_a^4] I$. Now $[E a_n^4 - \sigma_a^4] = 0$
for the binary case and equals 16 for the four-level case ($E a_n^4 = 41$, $\sigma_a^4 = 25$). It may be neglected with
respect to the first term even when $N$, the number of taps, is only moderately large.
Convergence is most rapid when $(1 - 2\alpha\sigma_a^2 + N\sigma_a^4\alpha^2)$ is smallest, i.e.,
when

$$\alpha = \frac{1}{N\sigma_a^2}. \tag{17}$$
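
As a minimal sketch of (15)-(17), the per-iteration factor and the two step-size landmarks can be evaluated directly. The function names below are illustrative and not from the paper; the formulas assume the independence approximation used in (14).

```python
def lms_contraction_factor(alpha, n_taps, sigma2):
    """Factor multiplying E||eps_n||^2 in Eq. (15) (independence assumption)."""
    return 1.0 - 2.0 * alpha * sigma2 + n_taps * sigma2 ** 2 * alpha ** 2


def lms_step_size_limits(n_taps, sigma2):
    """Return (alpha_max, alpha_opt): the stability bound (16) and the fastest step (17)."""
    alpha_max = 2.0 / (n_taps * sigma2)
    alpha_opt = 1.0 / (n_taps * sigma2)
    return alpha_max, alpha_opt


if __name__ == "__main__":
    # sigma2 = 1 for binary data, 5 for four-level data, as in Eq. (6).
    for label, sigma2 in (("binary", 1.0), ("four-level", 5.0)):
        alpha_max, alpha_opt = lms_step_size_limits(3, sigma2)
        factor = lms_contraction_factor(alpha_opt, 3, sigma2)
        print(label, alpha_max, alpha_opt, factor)
```

At the step size of (17) the factor reduces to $1 - 1/N$, so under this approximation more taps mean slower convergence even at the best step size.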

We close this section with some more notation. During transmission
the data symbols are determined by "slicing" the equalizer output $y_n$.
We name this nonlinear function $\mathrm{sl}(\cdot)$, for slicer. It is defined for binary
transmission by

$$\mathrm{sl}(x) = \mathrm{sgn}(x) \quad \text{(binary)}$$

and for four-level transmission by

$$\mathrm{sl}(x) = -\mathrm{sl}(-x) = \begin{cases} 3, & x > 2 \\ 1, & 0 < x < 2 \end{cases} \quad \text{(4-level)}$$
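
The two slicers can be sketched in a few lines. How ties at the thresholds ($x = 0$ and $|x| = 2$) are resolved is not specified in the text, so the choices below are assumptions made only so that the functions are defined everywhere.

```python
def sl_binary(x):
    """Binary slicer sgn(x); the value at x == 0 is an arbitrary assumption."""
    return 1 if x >= 0 else -1


def sl_four_level(x):
    """Four-level slicer with decision thresholds at 0 and +/-2.

    Odd symmetry sl(x) = -sl(-x) holds away from x == 0; ties at the
    thresholds are resolved arbitrarily (assumption).
    """
    magnitude = 3 if abs(x) > 2 else 1
    return magnitude if x >= 0 else -magnitude


if __name__ == "__main__":
    for x in (-2.7, -1.2, -0.4, 0.4, 1.2, 2.7):
        print(x, sl_binary(x), sl_four_level(x))
```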


III. DECISION-DIRECTED SURFACES 

The standard modification of (10) and (11), which is appropriate to
decision-directed startup, and which we analyze in this work, is simple
to describe. Instead of (10) and (11) we have

$$\mathbf{c}_{n+1} = \mathbf{c}_n - \alpha e_n \mathbf{a}_n, \tag{18}$$

$$e_n = \mathbf{c}_n\cdot\mathbf{a}_n - \hat{a}_n, \tag{19}$$

where

$$\hat{a}_n = \mathrm{sl}(\mathbf{c}_n\cdot\mathbf{a}_n). \tag{20}$$

Thus we have replaced the known-data symbol $a_n$ in (11) by its
estimate (20).†
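
One iteration of (18)-(20) can be sketched as follows for the ideal channel of Section II. The helper names are illustrative, and the binary slicer with its tie rule at zero is an assumption.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def sgn(x):
    """Binary slicer; the value at x == 0 is an arbitrary assumption."""
    return 1 if x >= 0 else -1


def dd_lms_step(c, a, alpha, slicer=sgn):
    """One decision-directed update: returns (c_next, e_n) per Eqs. (18)-(20)."""
    y = dot(c, a)                 # equalizer output, Eq. (1)
    decision = slicer(y)          # estimated symbol, Eq. (20)
    e = y - decision              # decision-directed error, Eq. (19)
    c_next = [ci - alpha * e * ai for ci, ai in zip(c, a)]  # Eq. (18)
    return c_next, e


if __name__ == "__main__":
    c, e = dd_lms_step([0.5, 0.5, 0.5], [1, -1, -1], alpha=1.0 / 3.0)
    print(c, e)
```

At a desired setting such as (1, 0, 0) the error is zero for every data vector, so the taps do not move. With binary data and $\alpha = 1/N$, the updated taps reproduce the decision exactly on the vector just processed, which is one way to see why the fluctuations at this step size are large.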

The task undertaken in this section is to describe the error surface
that goes along with (18)-(20). That is, we want to give the equivalents
of (4) and (8) which apply for known data. Since $e_n$ serves as an
estimated gradient in (18), the surface, which we call $\mathcal{F}^2$, is, in principle,
described by

$$\mathcal{F}^2 = E[\mathbf{c}\cdot\mathbf{a}_n - \hat{a}_n]^2 = E[\mathbf{c}\cdot\mathbf{a}_n - \mathrm{sl}(\mathbf{c}\cdot\mathbf{a}_n)]^2, \tag{21}$$

$\mathbf{c}$ being a generic point in tap space. Averaging over the data vector $\mathbf{a}_n$
imposes the major difficulty. By stationarity, the average in (21)



† To one unfamiliar with the actual data-transmission algorithm, it no doubt seems
absurd to regard $\mathbf{a}_n$ as known, as in (18), but yet $a_n$, its first component, is not known.
Actually in the real problem, $\mathbf{a}_n$ in (18)-(20) is replaced by a vector which is measured.
For the ideal channel the measured value would, in fact, be $\mathbf{a}_n$, but the point is that the
machine which implements the algorithm is built to handle the general case and would
not know of, nor could it make use of, this fact. A similar remark could have been made
in the treatment of (10) and (11).

1862 THE BELL SYSTEM TECHNICAL JOURNAL, DECEMBER 1980 



doesn't depend on $n$ and in such situations we often drop the subscript,
writing simply $\mathbf{a}$ for $\mathbf{a}_n$. The components of $\mathbf{a}$ are then $(a_1, \ldots, a_N)$.
Having mentioned this, we trust that no confusion will arise with the
convention established in (2).
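We begin with the binary case. Applying (6) and (7) to (21) gives

$$\mathcal{F}^2 = 1 + \mathbf{c}\cdot\mathbf{c} - 2E|\mathbf{c}\cdot\mathbf{a}|, \tag{22}$$

where the last term in (22) must still be averaged over the $2^N$ binary
vectors $\mathbf{a}^{(i)}$, $i = 1, 2, \ldots, 2^N$. Now note that, for a fixed $i$, the hyperplane
$\mathbf{c}\cdot\mathbf{a}^{(i)} = 0$ divides $N$-space into two regions, depending on the sign of
$\mathbf{c}\cdot\mathbf{a}^{(i)}$; in one region, $\mathbf{c}\cdot\mathbf{a}^{(i)} > 0$, while $\mathbf{c}\cdot\mathbf{a}^{(i)} < 0$ in the other. Then the
entire collection of such hyperplanes divides $N$-space into a number of
cone-shaped regions with the property that, in each cone, $\mathbf{c}\cdot\mathbf{a}^{(i)}$ has a
fixed sign (which depends on $i$). Suppose then, that $\mathbf{c}$ is in one of these
cones, called $\mathcal{R}$, and let $\mathcal{S}$ denote the set of indices $\{i\}$ for which
$\mathbf{c}\cdot\mathbf{a}^{(i)} > 0$, so $|\mathbf{c}\cdot\mathbf{a}^{(i)}| = \mathbf{c}\cdot\mathbf{a}^{(i)}$, $i \in \mathcal{S}$. Denote the complement set of
indices by $\mathcal{S}^*$. With this notation (22) becomes, for $\mathbf{c} \in \mathcal{R}$,

$$\begin{aligned}
\mathcal{F}^2 &= 1 + \mathbf{c}\cdot\mathbf{c} - \frac{2}{2^N}\Big[\sum_{i\in\mathcal{S}} \mathbf{c}\cdot\mathbf{a}^{(i)} - \sum_{i\in\mathcal{S}^*} \mathbf{c}\cdot\mathbf{a}^{(i)}\Big] \\
&= 1 + \mathbf{c}\cdot\mathbf{c} - 2\,\mathbf{c}\cdot\frac{1}{2^N}\Big[\sum_{i\in\mathcal{S}} \mathbf{a}^{(i)} - \sum_{i\in\mathcal{S}^*} \mathbf{a}^{(i)}\Big] \\
&= 1 + \mathbf{c}\cdot\mathbf{c} - 2\,\mathbf{c}\cdot\mathbf{c}_0,
\end{aligned} \tag{23}$$

where $\mathbf{c}_0$ is defined by (23) in the obvious way. Since the quadratic
form in (23) is strictly positive definite, the function $\mathcal{F}^2$ has a unique
minimum in the region $\mathcal{R}$, at $\mathbf{c} = \mathbf{c}_0$, provided that the vector $\mathbf{c}_0 \in \mathcal{R}$.
If $\mathbf{c}_0 \notin \mathcal{R}$, $\mathcal{F}^2$ is convex but has no minimum interior to $\mathcal{R}$. In the former
case, we denote the value of $\mathcal{F}^2$ at the minimum by $\mathcal{F}_0^2$, and we have

$$\mathcal{F}_0^2 = 1 - \mathbf{c}_0\cdot\mathbf{c}_0 \ge 0. \tag{24}$$

The above discussion shows that $\mathcal{F}^2$ always has its quadratic part,
$\mathbf{c}\cdot\mathbf{c} = \mathbf{c}^{+} I \mathbf{c}$, determined by the identity matrix, while the linear term,
$-2\,\mathbf{c}\cdot\mathbf{c}_0$, changes from region to region; we expect a different $\mathbf{c}_0$ for
each region. But since $|x|$ is a continuous function, we see, from (22),
that $\mathcal{F}^2$ is also continuous.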
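
The regional minimum of (23) and (24) can be sketched by brute-force enumeration of the $2^N$ binary vectors. The function name is illustrative, and the sketch assumes the supplied tap vector is generic, that is, it does not lie on any hyperplane $\mathbf{c}\cdot\mathbf{a}^{(i)} = 0$.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def region_minimum(c):
    """For the cone containing c, return (c0, f0_sq, c0_in_region).

    c0 is the vector defined by Eq. (23), f0_sq = 1 - c0.c0 is Eq. (24),
    and c0_in_region tells whether c0 lies in the same cone as c, i.e.
    whether the cone has an interior local minimum.
    """
    n = len(c)
    vectors = list(product((1, -1), repeat=n))
    c0 = [0.0] * n
    for a in vectors:
        sign = 1 if dot(c, a) > 0 else -1
        for k in range(n):
            c0[k] += sign * a[k]
    c0 = [x / len(vectors) for x in c0]
    f0_sq = 1.0 - dot(c0, c0)
    c0_in_region = all((dot(c, a) > 0) == (dot(c0, a) > 0) for a in vectors)
    return c0, f0_sq, c0_in_region


if __name__ == "__main__":
    print(region_minimum([0.9, 0.1, 0.2]))   # a cone around the first axis
    print(region_minimum([0.4, 0.5, 0.6]))   # a cone around a vertex direction
```

Enumeration is only practical for small $N$, which matches the low-dimensional examples used in this section.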

Counting the number of cone-shaped regions appears to be very
difficult in general. The problem is equivalent to the following. Let $O$,
the origin, be at the center of an $N$-cube, and consider all hyperplanes
through $O$ which are perpendicular to some vertex vector. Into how
many cones do these hyperplanes divide $N$-space? For $N = 2, 3, 4, 5$,
there are 4, 14, 104, 1882 cones, respectively.† We obtain sufficient
insight for our purposes by considering some low-dimensional examples
of (23).
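
For small $N$ the cone count can be explored numerically. The sketch below is not the method behind the quoted counts; it simply samples random tap vectors and records the distinct sign patterns of $\mathbf{c}\cdot\mathbf{a}^{(i)}$, which gives a lower bound that is reliable only when every cone has a non-negligible solid angle. The function name, sample count, and seed are assumptions.

```python
import random
from itertools import product


def count_cones_sampled(n, samples=20000, seed=0):
    """Lower bound on the number of cones for dimension n, found by sampling."""
    rng = random.Random(seed)
    # Vectors a and -a define the same hyperplane, so keep one of each pair.
    half = [a for a in product((1, -1), repeat=n) if a[0] == 1]
    seen = set()
    for _ in range(samples):
        c = [rng.gauss(0.0, 1.0) for _ in range(n)]
        pattern = tuple(sum(ci * ai for ci, ai in zip(c, a)) > 0 for a in half)
        seen.add(pattern)
    return len(seen)


if __name__ == "__main__":
    for n in (2, 3):
        print(n, count_cones_sampled(n))
```

Each distinct sign pattern corresponds to exactly one cone, so the sampler can miss small cones but cannot overcount.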

For $N = 2$, the lines through the origin which are perpendicular to
the vertex vectors divide the plane into four regions, as shown in Fig.
1. Calculating $\mathbf{c}_0$ for region I, for example, gives, using (23),

$$\mathbf{c}_0 = \frac{1}{4}\big[(1, 1) + (1, -1) - (-1, 1) - (-1, -1)\big] = (1, 0). \tag{25}$$

Its position along the $c_1$ axis is indicated by a small circle (as are the
$\mathbf{c}_0$ vectors for the other regions). Since $\mathbf{c}_0$ is actually in region I, we
have a local minimum there with $\mathcal{F}_0^2 = 0$. Note that if a data vector
$\mathbf{a} = (a_1, a_2)$ is sent, then $\mathbf{c}\cdot\mathbf{a} = \mathbf{c}_0\cdot\mathbf{a} = a_1$, and perfect detection of the
first symbol occurs. Note $\mathcal{E}^2$ in (8) also has its unique minimum at this
point. However, from (19) and (20), $e_n$ will always be zero if
$\mathbf{c} = \mathbf{c}_0 = (-1, 0)$ as well. Thus $\mathcal{F}^2$ has a local minimum there too, and also at the
points $\mathbf{c}_0 = (0, \pm 1)$ (the second symbol is also a valid one to use for
detection). Such symmetries will always occur, and we consider just a
representative $\mathbf{c}_0$, e.g., $(1, 0, 0, \ldots, 0)$, for general $N$. This vector is
representative of one of $2N$ positions to which we would wish the
algorithm to converge. For $N = 2$, no other minima occur.

The case N = 3 is the first interesting one. To describe the cones, we 
have shown their intersection with the cube in Fig. 2. Two types of 
cones occur. There are four-sided ones which intersect the faces in 
squares, and three-sided ones having their axes along the vertex 
directions. Representative minima of $\mathcal{F}^2$ occur at $\mathbf{c}_0 = (1, 0, 0)$ and
$\mathbf{c}_0 = (1/2)(1, 1, 1)$. Thus there are six minima of the first type (centers of
faces), and eight of the second type (along vertex directions). At the
former, $\mathcal{F}_0^2 = 0$; at the latter, $\mathcal{F}_0^2 = 0.25$.

If we are in the region containing (1, 0, 0), we always make a correct
decision (on the first symbol of the $\mathbf{a}$ vector, it turns out) and the
probability of error $P_e = 0$. If we are in the region containing (1, 1, 1),
the data vectors $\mathbf{a}$ = (1, 1, 1), (1, 1, −1), (1, −1, 1), and their negatives
always have their first symbol decoded correctly [i.e., $a_1 = \mathrm{sgn}(\mathbf{c}\cdot\mathbf{a})$]
whereas (1, −1, −1) and its negative give an incorrect value. Thus
$P_e = 1/4$ for the first symbol when the tap vector is in this region. For
this situation it happens that $P_e = 1/4$ for any other symbol too.
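
The error probabilities quoted above can be checked by enumeration. This is a minimal sketch for binary data; the function name is illustrative, and the tap vector is assumed not to lie on a region boundary.

```python
from fractions import Fraction
from itertools import product


def symbol_error_probabilities(c):
    """Return [P_e for symbol 1, ..., P_e for symbol N] for binary data.

    P_e for symbol k is the fraction of equiprobable data vectors a for
    which the decision sgn(c.a) differs from a_k.
    """
    n = len(c)
    vectors = list(product((1, -1), repeat=n))
    probabilities = []
    for k in range(n):
        errors = 0
        for a in vectors:
            y = sum(ci * ai for ci, ai in zip(c, a))
            decision = 1 if y > 0 else -1
            if decision != a[k]:
                errors += 1
        probabilities.append(Fraction(errors, len(vectors)))
    return probabilities


if __name__ == "__main__":
    print(symbol_error_probabilities((1, 1, 1)))
    print(symbol_error_probabilities((1, 0.1, -0.1)))
```

Only the direction of the tap vector matters for binary decisions, so (1, 1, 1) stands in for any vector in the same cone.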

Thus, for $N = 3$ we see that if we choose a bad initial state for the
equalizer, namely an initial tap vector lying in a vertex cone, convergence
via gradient search will be to the local minimum at (1/2)(1, 1, 1).
For all practical purposes, it will, because of initial conditions, have
converged during decision-directed startup to an undesired set of tap
weights. We are assuming here that the lms algorithm behaves as if
the true gradient of the surface were being used. This is essentially
true if the step size $\alpha$ is small enough; more comments on this will be
made later.

† A list of the number of regions for $N$ up to ten is given in Ref. 3.

Fig. 1. Cones for $N = 2$.

Similar situations prevail in five dimensions, where local minima 
occur when the taps are proportional to the representative vectors 
(10000), (11100), (11111), (53311), and (22111). There are also two 
other classifications of cones which do not have local minima in their 
interiors. 

The situation which includes noise and distortion should be clear. 
Certain unknown optimum tap settings exist, one of which we would 
hope to converge to, during decision-directed startup. If we make an 
initial guess close to such a desired local minimum, we converge there. 
If not, we converge to an undesired setting, yielding a bad error rate. 
Later, when we consider fluctuations for a finite step size, we shall see 
that capture at a spurious minimum need not be permanent; capture 
at a desired local minimum will be. 

Fig. 2. Cones for $N = 3$.

At this point we stop our investigation of the binary problem and
move briefly to the four-level case. For this new situation we have, in
place of (22), the representation

$$\mathcal{F}^2 = \sigma_a^2\,\mathbf{c}\cdot\mathbf{c} + E[\mathrm{sl}^2(\mathbf{c}\cdot\mathbf{a}) - 2\,\mathbf{c}\cdot\mathbf{a}\;\mathrm{sl}(\mathbf{c}\cdot\mathbf{a})], \tag{26}$$

where $\mathrm{sl}^2(x)$ means $(\mathrm{sl}(x))^2$. In (26) the average is taken with respect to
all $4^N$ equilikely vectors $\mathbf{a}$, which have ±1, ±3 as components. Note
that $\mathcal{F}^2$ in (26) is a continuous function of $\mathbf{c}$, because $\mathrm{sl}^2(x) - 2x\,\mathrm{sl}(x)$
is continuous.

We again partition $N$-space into regions, where now in each region
$\mathrm{sl}(\mathbf{c}\cdot\mathbf{a}^{(i)})$ is constant for each fixed $i$, $i = 1, 2, \ldots, 4^N$. The averaging
indicated in (26) again leads to a quadratic-plus-linear structure within
each region, although the regional map is now considerably more
complex than in the binary case. Its most outstanding feature is that
the regions are now not cone-shaped. A map of regions for the four- 
level, N = 2 problem is drawn in Fig. 3 where the additional complexity 
is readily apparent. The error-free regions about the optimum tap 
vectors are indicated by the small kitelike regions, cross hatched in 
the figure. A much more accurate guess would have to be made with 
four-level transmission to assure that one had error-free data in a 
decision-directed startup procedure. 

For the four-level case in $N$ dimensions we still have local (and
global) minima at the optimum tap values represented by
$(1, 0, 0, \ldots, 0)$, $\mathcal{F}_0^2 = 0$, and other local minima as well. Although we have
made no attempt to describe all the other local minima, there is one
class that we do mention. We find it by looking for a local minimum of
(26) at $\mathbf{c} = \mathbf{c}_0 = (g, 0, 0, \ldots, 0)$, $g > 0$. In the neighborhood of such a
tap vector we have $\mathrm{sl}(\mathbf{c}\cdot\mathbf{a}) = \mathrm{sl}(g a_1)$. If $0 < g < 2/3$, then
$\mathrm{sl}(g a_1) = \mathrm{sgn}(a_1)$, and we have

$$\mathcal{F}^2 = \sigma_a^2\,\mathbf{c}\cdot\mathbf{c} + 1 - 2E[\mathbf{c}\cdot\mathbf{a}\;\mathrm{sgn}\,a_1] = 1 + \sigma_a^2\,\mathbf{c}\cdot\mathbf{c} - 4c_1. \tag{27}$$

It follows from (27) that there is a minimum at

$$\mathbf{c} = (2/5, 0, 0, \ldots, 0), \tag{28}$$

at which

$$\mathcal{F}_0^2 = 0.2. \tag{29}$$

A graph of $\mathcal{F}^2$, as we move out along the $c_1$ axis, is independent of
the dimension $N$, and is shown in Fig. 4. In particular, the minimum at
the value of $\mathbf{c}$ given by (28), and also the global optimum, are to be
noted.

Fig. 3. Regions for $N = 2$, four-level transmission.

The character of the equalizer output when the tap vector is trapped 
at (28) may be noted. Instead of observing the ±1, ±3 data values we 
would see ±2/5, ±6/5, all of which would be decoded as ±1. E. Y. Ho²
has, in fact, observed such contracted signal constellations during 
startup experiments. 
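
The contracted constellation can be reproduced with exact arithmetic. The sketch below evaluates the four-level surface for a tap vector $(g, 0, \ldots, 0)$, for which $\mathbf{c}\cdot\mathbf{a} = g a_1$; the function names and the tie rule of the slicer at its thresholds are assumptions.

```python
from fractions import Fraction

LEVELS = (-3, -1, 1, 3)  # equiprobable four-level data symbols


def sl4(x):
    """Four-level slicer with thresholds at 0 and +/-2 (ties are an assumption)."""
    magnitude = 3 if abs(x) > 2 else 1
    return magnitude if x >= 0 else -magnitude


def surface_on_axis(g):
    """E[(g*a - sl(g*a))^2] over the four levels: the surface along the c_1 axis."""
    return sum((g * a - sl4(g * a)) ** 2 for a in LEVELS) / len(LEVELS)


if __name__ == "__main__":
    g = Fraction(2, 5)
    print(surface_on_axis(g))                # value at the trapped setting (28)
    print([g * a for a in LEVELS])           # contracted equalizer outputs
    print([sl4(g * a) for a in LEVELS])      # every symbol is decoded as +/-1
    print(surface_on_axis(Fraction(1)))      # global optimum
```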

An approximate description of a large number of other minima is 
deferred to Appendix A. 



IV. FINITE STEP SIZE 

In Section III we described the error surface appropriate to decision-directed
startup with the mean-square algorithm and showed that it
had many minima. Further, we assumed for sufficiently small step size
$\alpha$ that the local motion on this surface followed the gradient directions.
This implies that we would reach equilibrium at a local minimum and
remain there. For finite step size this is a useful picture, but it is only
an approximate one. The most important correction that we must
make to it is to realize that the extraneous minima of $\mathcal{F}^2$ that we have
discovered are not truly stable. We will, if only we wait long enough,
always reach one of the global minima with $\mathcal{F}_0^2 = 0$.

Fig. 4. The surface $\mathcal{F}^2$ along the $c_1$ axis (four-level transmission).
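To illustrate this, assume that we are initially in a region possessing
a local minimum at $\mathbf{c} = \mathbf{c}_0$, and that we remain in that region for a long
time. Then one may derive an equation for the behavior of the mean
norm of the error vector $\boldsymbol{\epsilon}_n = \mathbf{c}_n - \mathbf{c}_0$. In fact, subtracting $\mathbf{c}_0$ from both
sides of (18) gives

$$\boldsymbol{\epsilon}_{n+1} = \boldsymbol{\epsilon}_n - \alpha\,\mathbf{a}_n[\mathbf{a}_n\cdot\boldsymbol{\epsilon}_n + \mathbf{c}_0\cdot\mathbf{a}_n - \mathrm{sl}(\mathbf{c}_n\cdot\mathbf{a}_n)]. \tag{30}$$

By definition of our regions and the assumption that we do not leave
the region, we have

$$\mathrm{sl}(\mathbf{c}_n\cdot\mathbf{a}_n) = \mathrm{sl}(\mathbf{c}_0\cdot\mathbf{a}_n), \quad \text{all } n. \tag{31}$$

Letting

$$Q_0 = \mathbf{c}_0\cdot\mathbf{a}_n - \mathrm{sl}(\mathbf{c}_0\cdot\mathbf{a}_n), \tag{32}$$

and squaring both members of (30) and taking averages with respect
to all data symbols, we have

$$E\epsilon_{n+1}^2 = E[\epsilon_n^2 - 2\alpha(\boldsymbol{\epsilon}_n\cdot\mathbf{a}_n)^2 - 2\alpha Q_0\,\boldsymbol{\epsilon}_n\cdot\mathbf{a}_n] + \alpha^2 E\,\mathbf{a}_n^{+}\mathbf{a}_n[(\boldsymbol{\epsilon}_n\cdot\mathbf{a}_n)^2 + 2Q_0\,\boldsymbol{\epsilon}_n\cdot\mathbf{a}_n + Q_0^2], \tag{33}$$

where, for notational simplicity, we have set $\epsilon_n^2 = \|\boldsymbol{\epsilon}_n\|^2$. Then it may
be shown, from (33), using approximations of the type described in
Section II, that we have, approximately,

$$E\epsilon_{n+1}^2 = (1 - 2\alpha\sigma_a^2 + N\alpha^2\sigma_a^4)\,E\epsilon_n^2 + N\alpha^2\sigma_a^2\mathcal{F}_0^2. \tag{34}$$

Thus, from (34), as $n$ becomes large the average of the squared-error
vector approaches

$$E\epsilon_\infty^2 = \frac{\alpha N}{2 - \alpha N\sigma_a^2}\,\mathcal{F}_0^2. \tag{35}$$

To ensure rapid initial convergence one normally chooses $\alpha N\sigma_a^2 = 1$,
and for this choice of $\alpha$, (35) becomes $E\epsilon_\infty^2 = \mathcal{F}_0^2/\sigma_a^2$.

Stability requires $\alpha N\sigma_a^2 < 2$, as is readily apparent from (34).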

If we are in equilibrium about $\mathbf{c}_{\mathrm{opt}}$, then $\mathcal{F}_0^2 = 0$ and, from (35), there
are no fluctuations. However, consider the binary case with $N = 3$,
$\mathbf{c}_0 = (1/2)(1, 1, 1)$, and $\alpha = 1/N$. For this case $E\epsilon_\infty^2 = \mathcal{F}_0^2 = 0.25$; thus a
typical error vector might have length about $\sqrt{E\epsilon_\infty^2} = \sqrt{0.25} = 0.5$. But
the distance from $\mathbf{c}_0$ to (1, 0, 0) is only $\sqrt{0.75} = 0.87$. Certainly it is
reasonable to expect that fluctuations would soon move $\mathbf{c}$ from the

DECISION-DIRECTED EQUALIZER CONVERGENCE 1869 



region containing $\mathbf{c}_0$ to the error-free region containing $\mathbf{c}_{\mathrm{opt}} = (1, 0, 0)$,
with convergence to $\mathbf{c}_{\mathrm{opt}}$ resulting.
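
The numbers in this example follow from (35) and can be reproduced with a short sketch. The function name is illustrative; the formula is the steady state of the approximate recursion (34), so it inherits that approximation.

```python
import math


def steady_state_msd(alpha, n_taps, sigma2, f0_sq):
    """Steady-state mean-squared tap error of Eq. (35)."""
    loading = alpha * n_taps * sigma2
    if not 0.0 < loading < 2.0:
        raise ValueError("stability requires 0 < alpha * N * sigma_a^2 < 2")
    return alpha * n_taps * f0_sq / (2.0 - loading)


if __name__ == "__main__":
    n_taps = 3
    msd = steady_state_msd(alpha=1.0 / n_taps, n_taps=n_taps, sigma2=1.0, f0_sq=0.25)
    rms = math.sqrt(msd)
    # Distance from the spurious minimum (1/2)(1, 1, 1) to the desired setting (1, 0, 0).
    distance = math.dist((0.5, 0.5, 0.5), (1.0, 0.0, 0.0))
    print(msd, rms, distance)
```

Reducing the step size shrinks the fluctuation roughly in proportion, which is why small step sizes lengthen the capture.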

As we note from (35), the mean-squared fluctuation decreases for 
small $\alpha$. Thus, for $\alpha$ small, we expect to wait a very long time for
deviations of the required magnitude to occur, and our earlier assump- 
tion of being trapped at an undesired minimum is, in this sense, 
justified. 

Examining the detailed mechanism causing ultimate convergence to
a $\mathbf{c}_{\mathrm{opt}}$ for the above example is worthwhile. For definiteness, consider
convergence to (1, 0, 0). Table I illustrates the possible $\mathbf{a}$ vectors (only
four of the eight need be listed) and the resulting decisions on the first
symbol.

In any infinitely long time sequence of independently chosen vectors
$\mathbf{a}$, there will occur, if we wait, long runs where the vector (1, −1, −1)
does not occur. Then we have no errors in the first symbol, and the
tap vector moves, if the run is long enough, to a neighborhood of (1, 0,
0), after which no errors occur, independently of what the succeeding
$\mathbf{a}$ vectors are. In this manner we can imagine, in higher-dimensional
problems, special sequences, low in errors for the $k$th symbol, causing
the tap vector to move from one region to another, until the error-free
region about the $k$th coordinate axis is entered.
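
The escape mechanism can be watched in a small simulation of (18)-(20) on the ideal binary channel. This is a sketch under stated assumptions: the function names, the random seed, the stopping tolerance, and the step budget are illustrative choices, and the number of steps needed depends on the particular random sequence.

```python
import random


def is_signed_unit_vector(c, tol):
    """True if c is within tol (per component) of a vector such as (0, -1, 0)."""
    near_one = sum(1 for x in c if abs(abs(x) - 1.0) <= tol)
    near_zero = sum(1 for x in c if abs(x) <= tol)
    return near_one == 1 and near_zero == len(c) - 1


def simulate_escape(c_start, alpha, max_steps=100000, tol=1e-6, seed=1):
    """Run decision-directed lms until the taps reach a desired setting.

    Returns (steps, c); steps is None if max_steps is exhausted first.
    """
    rng = random.Random(seed)
    c = list(c_start)
    for step in range(max_steps):
        if is_signed_unit_vector(c, tol):
            return step, c
        a = [rng.choice((1, -1)) for _ in c]        # independent binary data
        y = sum(ci * ai for ci, ai in zip(c, a))    # equalizer output
        decision = 1 if y >= 0 else -1              # sliced estimate
        e = y - decision                            # decision-directed error
        c = [ci - alpha * e * ai for ci, ai in zip(c, a)]
    return None, c


if __name__ == "__main__":
    steps, taps = simulate_escape([0.5, 0.5, 0.5], alpha=1.0 / 3.0)
    print(steps, [round(x, 6) for x in taps])
```

Starting at the spurious minimum (1/2)(1, 1, 1) with $\alpha = 1/N$, the run ends at one of the $2N$ desired settings; which one is reached depends on the data sequence.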

For small step size a diffusion approximation should describe the 
randomness quite well. However, the difficulty that we have in describing
(or even counting) the regions in $N$ dimensions prevents such an
approach from giving precise information as to convergence times. 
Nevertheless some model problems are considered in Appendix B. 

Simulations show that, for the binary problem, some moderate delay
is experienced with regard to convergence to $\mathbf{c}_{\mathrm{opt}}$ when starting at a
random position with $\alpha = 1/N$. The delay does become excessive for
four-level transmission. This may be due to the smaller error-free 
region which must be reached. 

Table I. Decision table for c = (1/2)(1, 1, 1)

    a               2c·a    â = sgn(c·a)
    (1, 1, 1)         3      1 (correct)
    (1, 1, -1)        1      1 (correct)
    (1, -1, 1)        1      1 (correct)
    (1, -1, -1)      -1     -1 (error)

APPENDIX A

Approximate Description of Some Minima

The discussion in Section III emphasized the great plurality of
regions and local minima associated with the surface represented by
(21). A natural question is whether we can obtain an approximate but
simpler representation of at least some of these minima. This appendix 
provides an affirmative answer to that question for large values of the 
dimension N. We begin with the surface (22) which applies to the case 
of binary transmission: 

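$$\mathcal{F}^2 = 1 + \mathbf{c}\cdot\mathbf{c} - 2E|\mathbf{c}\cdot\mathbf{a}|. \tag{36}$$

The key is to note that if the vector $\mathbf{c}$ has many components approximately
equal, then $\mathbf{c}\cdot\mathbf{a}$ will be approximately Gaussian with mean zero
and variance $\sum_i c_i^2$. Since, for a zero-mean Gaussian variable having
variance $\sigma^2$ we have $E|x| = \sqrt{2/\pi}\,\sigma$, (36) becomes

$$\mathcal{F}^2 = 1 + \sum_i c_i^2 - 2\sqrt{\frac{2}{\pi}}\sqrt{\sum_i c_i^2}. \tag{37}$$

$\mathcal{F}^2$ has a local minimum whenever $\mathbf{c}$ is such that

$$\sqrt{\sum_i c_i^2} = \sqrt{\frac{2}{\pi}} = 0.798. \tag{38}$$

At the local minima we have

$$\mathcal{F}_0^2 = 0.363. \tag{39}$$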

For four-level transmission the surface with which we must deal is
described by (26). The presence of the function $\mathrm{sl}(\mathbf{c}\cdot\mathbf{a})$ only slightly
complicates the calculations now; answers may readily be obtained
numerically. We now have local minima whenever

$$\sqrt{\sum_i c_i^2} \approx 0.51, \tag{40}$$

and at the minima we have

$$\mathcal{F}_0^2 \approx 0.340. \tag{41}$$
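
Both Gaussian-approximation results can be explored numerically. In the sketch below the binary profile is (37) written as a function of the radius $r = \sqrt{\sum_i c_i^2}$. The four-level profile is an assumption of this sketch: it evaluates (26) in closed form for a zero-mean Gaussian output of variance $\sigma_a^2 r^2$ and is not taken from the paper, so its minimum should land near, but need not match to the last digit, the values quoted in (40) and (41). The grid search and function names are also illustrative.

```python
import math


def binary_gaussian_surface(r):
    """Eq. (37) as a function of the tap-vector radius r."""
    return 1.0 + r * r - 2.0 * math.sqrt(2.0 / math.pi) * r


def four_level_gaussian_surface(r, sigma2=5.0):
    """Eq. (26) with c.a treated as Gaussian of variance sigma2 * r^2 (assumption)."""
    s = math.sqrt(sigma2) * r                         # std. deviation of the output
    p_outer = math.erfc(2.0 / (s * math.sqrt(2.0)))   # P(|x| > 2): decision is +/-3
    mean_sl_sq = 1.0 + 8.0 * p_outer                  # E[sl(x)^2]
    mean_x_sl = s * math.sqrt(2.0 / math.pi) * (1.0 + 2.0 * math.exp(-2.0 / (s * s)))
    return s * s + mean_sl_sq - 2.0 * mean_x_sl


def grid_minimum(f, lo=0.05, hi=2.0, steps=4000):
    """Crude grid search for the minimizing radius of f on [lo, hi]."""
    best = min((lo + (hi - lo) * i / steps for i in range(steps + 1)), key=f)
    return best, f(best)


if __name__ == "__main__":
    print(grid_minimum(binary_gaussian_surface))
    print(grid_minimum(four_level_gaussian_surface))
```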

Thus if N is large and c is not too close to any axis, we expect many 
minima located at the indicated radii, and all of about the same depth. 
Hence for these minima we expect the motion from one to the other 
to be more like free diffusion rather than leakage from a well. The
difference in diffusion times for these two ideal situations is discussed
in greater detail in Appendix B.

We close this appendix with a remark on the characteristic appear- 
ance of the equalizer output when its tap vector is trapped at a local 
minimum of the type just described (in contrast to the local minimum 
found at the end of Section III). The Gaussian assumption made 
concerning the distribution of c • a, which is, in fact, the output, implies 
that the output will have a unimodal distribution, peaked at the origin, 
and of variance indicated above. Such behavior has been observed by Gitlin
and Werner.¹

APPENDIX B 

Model Diffusion Problems 

In discussing finite step-size effects in Section IV we suggested that, 
for small step size, a diffusion approximation would be a useful model 
for the random dynamics inherent in the adaptive algorithms that we 
are considering. We saw, further, that decision-directed startup pro- 
cedures lead to a complicated region geometry for the error surface, 
and we mentioned that this precluded precise computation for the 
convergence rate of the optimum tap weights. However, an intuitive 
feeling for typical behavior certainly is worthwhile, and so we present 
in this appendix solutions to some simple but relevant model problems 
in diffusion. 

A typical diffusion problem for our work would involve, say, finding 
the average time for a particle, starting at a given initial position, to 
diffuse to an error-free region. In setting up such a problem for solution, 
the boundary of the error-free region would be replaced by an absorb- 
ing barrier and the mean-first-passage time to hit the barrier would be 
required. Therefore, in our model problems, we treat situations where 
the starting point is surrounded by an absorbing barrier of simple 
form. 

It is well known that one may approximate an isotropic random
walk in $N$ dimensions by a free diffusion.⁴ If $p(\mathbf{x}, t \mid \mathbf{x}_0, t_0) = p$ is the
probability density for finding the particle at time $t$ at position $\mathbf{x}$, given
that at time $t_0$ it was at $\mathbf{x}_0$, then the density $p$ obeys the diffusion
equation⁴

$$\frac{\partial p}{\partial t} = D\nabla^2 p, \tag{42}$$

$\nabla^2$ being the $N$-dimensional Laplacian operator. The diffusion constant
$D$ is given by⁴

$$D = \frac{1}{2N}\,E\|\Delta\mathbf{x}\|^2\,\nu, \tag{43}$$

where $\Delta\mathbf{x}$ is a step in the random walk that we are approximating by
the diffusion, and $\nu$ is the number of steps per unit time.
We have, of course, the initial condition for (42),

$$\lim_{t \to t_0} p = \delta(\mathbf{x} - \mathbf{x}_0). \tag{44}$$

Furthermore, there are the boundary conditions: $p = 0$ on an absorbing
wall, while the normal derivative of $p$ vanishes at a perfectly reflecting
surface.
Our first task is to find an expression for the diffusion constant $D$ in
terms of the constants of the equalization problem. To this end, we
rewrite (36) [restricting it to a given region and using (32)] as

$$\boldsymbol{\epsilon}_{n+1} = \boldsymbol{\epsilon}_n + \Delta\boldsymbol{\epsilon}_n - \alpha \mathbf{a}_n [\mathbf{a}_n' \boldsymbol{\epsilon}_n], \tag{45}$$

with

$$\Delta\boldsymbol{\epsilon}_n = \alpha \mathbf{a}_n e_0. \tag{46}$$

Equation (45) is then of the form of a random walk with a restoring
term $-\alpha \mathbf{a}_n (\mathbf{a}_n' \boldsymbol{\epsilon}_n)$. The quantity
$\Delta\boldsymbol{\epsilon}_n$ alone represents the steps that would be taken in a
free random walk, and thus $\Delta\boldsymbol{\epsilon}_n$ is to be identified
with the step $\Delta\mathbf{x}$ in (43). Assuming for convenience an isotropic
diffusion, we have, approximately,

$$E\|\Delta\boldsymbol{\epsilon}_n\|^2 \approx \alpha^2 (E\,\mathbf{a}_n'\mathbf{a}_n)(E\,e_0^2) = N \alpha^2 \sigma_a^2 \mathcal{E}_0. \tag{47}$$

Thus if we identify the time $t$ with $n$, so that $\rho$ = one step/sec, we
have, using (47) in (43),

$$D = \frac{\alpha^2 \sigma_a^2 \mathcal{E}_0}{2}, \tag{48}$$

which is the expression for $D$ that we seek.
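The step from (47) to (48) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the four-level alphabet ±1, ±3 (so that $\sigma_a^2 = 5$) is an assumption, and the residual error is modeled as Gaussian with mean square $\mathcal{E}_0$ and independent of the data, which the text does not claim.

```python
import numpy as np

def diffusion_constant(alpha, sigma_a2, e0):
    """Eq. (48): D = alpha^2 * sigma_a^2 * E0 / 2 (one step per unit time)."""
    return 0.5 * alpha**2 * sigma_a2 * e0

def mc_mean_square_step(alpha, n_taps, e0, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E||alpha * a_n * e||^2, the left side of (47)."""
    rng = np.random.default_rng(seed)
    levels = np.array([-3.0, -1.0, 1.0, 3.0])      # assumed four-level alphabet
    a = rng.choice(levels, size=(n_samples, n_taps))
    # Assumed model of the residual error: Gaussian, independent of the data.
    e = np.sqrt(e0) * rng.standard_normal(n_samples)
    steps = alpha * a * e[:, None]                  # one free-walk step per row
    return np.mean(np.sum(steps**2, axis=1))

if __name__ == "__main__":
    alpha, n_taps, e0 = 0.01, 8, 0.2
    ms = mc_mean_square_step(alpha, n_taps, e0)
    print("D from Monte Carlo via (43):", ms / (2 * n_taps))
    print("D from (48):               ", diffusion_constant(alpha, 5.0, e0))
```

Dividing the estimated mean-square step by $2N$, as (43) prescribes with $\rho = 1$, should agree with (48) to within sampling error.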

To generalize (42) to include the effect of the restoring term in (45),
we note that the diffusion equation may also be regarded as the
Fokker-Planck equation (Ref. 4) corresponding to the continuous-time
version of the random walk. The dynamical equation governing the latter
would simply be

$$\frac{d\boldsymbol{\epsilon}}{dt} = \sqrt{2D}\, \mathbf{n}(t), \tag{49}$$

$\mathbf{n}(t)$ being a Gaussian white-noise vector of zero mean with
independent components, each component of which is normalized as

$$E\, n(t)\, n(t') = \delta(t - t'). \tag{50}$$

Including the restoring term of (45) yields the following continuous-time
dynamical equation approximating the motion:

$$\frac{d\boldsymbol{\epsilon}}{dt} = -\alpha \sigma_a^2 \boldsymbol{\epsilon} + \sqrt{2D}\, \mathbf{n}(t), \tag{51}$$

where we have used (7) to obtain the first term of the right member.†

† The reader may wonder why no noise term appears in the dynamical equation
analogous to the random dynamical term $\mathbf{a}_n \mathbf{a}_n' \boldsymbol{\epsilon}_n$ of (30).
The answer is simply that such a term is of higher order in $\alpha$ and we
neglect it for simplicity. Theorems relevant to such small-$\alpha$ diffusion
approximations were first given by Kushner (Ref. 6). It was he who first
suggested application of diffusion theory to stochastic approximation algorithms.
We simply state that the Fokker-Planck equation for the density
$p = p(\mathbf{x}, t \mid \mathbf{x}_0, t_0)$ corresponding to this Markovian system is

$$\frac{\partial p}{\partial t} = \nabla \cdot [\alpha \sigma_a^2 \mathbf{x} p] + D \nabla^2 p. \tag{52}$$
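As a consistency check on the constants in the Fokker-Planck equation, one can look at its stationary solution when no barriers are present. This worked step is not in the original text. It assumes natural (zero-flux) boundary conditions at infinity and takes $D = \alpha^2 \sigma_a^2 \mathcal{E}_0 / 2$ from (48).

```latex
% Zero probability flux in the stationary state:
\alpha \sigma_a^2\, \mathbf{x}\, p_s + D\, \nabla p_s = 0
\quad\Longrightarrow\quad
p_s(\mathbf{x}) \propto \exp\!\left(-\frac{\alpha \sigma_a^2 \|\mathbf{x}\|^2}{2D}\right).

% Each component is therefore Gaussian with variance
\frac{D}{\alpha \sigma_a^2}
  = \frac{\alpha^2 \sigma_a^2 \mathcal{E}_0 / 2}{\alpha \sigma_a^2}
  = \frac{\alpha\, \mathcal{E}_0}{2}.
```

Under these assumptions the steady-state tap fluctuation per component scales linearly with the step size $\alpha$.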

The machinery just described is sufficient to solve some interesting 
problems. Obtaining the solutions for the simple problems that we 
consider is not difficult, and therefore only the results will be given. 
The detailed discussion up to this point was necessary for establishing 
the relationships between the constants appearing in our problem and 
those of diffusion theory. 

Our first model problem is: What is the mean-first-passage time $\bar{t}$
for a particle to freely diffuse (no restoring force) to a surrounding
sphere of radius $R$, in $N$ dimensions?

The answer may be derived using the diffusion equation (42) and
the average time turns out to be given by

$$\bar{t} = \frac{R^2}{2DN} = \frac{R^2}{\alpha^2 \mathcal{E}_0 \sigma_a^2 N}.$$

This expression for the average first-passage time $\bar{t}$ implies that if,
during decision-directed startup, we are in a region such as suggested
in Appendix A and the step size $\alpha$ is, on the one hand, small enough
for a diffusion approximation to hold, but yet is large enough so that
small variations of $\mathcal{E}$ in going from one local minimum to a
neighboring one are negligible, then we expect diffusion time out of the region
to increase as $1/\alpha^2$. Further, if $\alpha$ is held at a fixed percentage of the
typical value $\alpha = 1/(N\sigma_a^2)$, then $\bar{t}$ is proportional to $N$, the
number of equalizer taps.
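The free-diffusion result $\bar{t} = R^2/(2DN)$ can be compared against a direct random-walk simulation. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: the walk starts at the center of the sphere, takes Gaussian steps with standard deviation `step_std` per component (so that $D = \texttt{step\_std}^2/2$ by (43) with $\rho = 1$), and all function names are hypothetical.

```python
import numpy as np

def free_diffusion_mfpt(radius, diffusion, n_dim):
    """Mean first-passage time R^2 / (2 D N) for free diffusion to a sphere."""
    return radius**2 / (2.0 * diffusion * n_dim)

def mean_exit_time(n_dim, radius, step_std, n_trials=2000, seed=0):
    """Average number of steps for an isotropic Gaussian walk to leave the sphere."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_trials, n_dim))              # all walkers start at the center
    exit_step = np.zeros(n_trials, dtype=int)
    alive = np.ones(n_trials, dtype=bool)
    step = 0
    while alive.any():
        step += 1
        n_alive = int(alive.sum())
        x[alive] += step_std * rng.standard_normal((n_alive, n_dim))
        escaped = alive & (np.linalg.norm(x, axis=1) >= radius)
        exit_step[escaped] = step                # record first crossing only
        alive &= ~escaped
    return exit_step.mean()

if __name__ == "__main__":
    n_dim, radius, step_std = 3, 1.0, 0.03
    d = step_std**2 / 2.0                        # Eq. (43) with rho = 1
    print("simulated:", mean_exit_time(n_dim, radius, step_std))
    print("predicted:", free_diffusion_mfpt(radius, d, n_dim))
```

Because the walk is only observed at discrete steps, it overshoots the boundary slightly, so the simulated time is expected to sit a few percent above the continuum prediction; the gap shrinks as `step_std` decreases relative to the radius.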

We choose our second example to be one dimensional, for simplicity.
A Brownian particle starts at the minimum of a symmetric well, as
shown in Fig. 5, and we assume that the points at $\pm R$ are absorbing.
What is the average time $\bar{t}$ before the particle is absorbed, i.e., leaves
the well?

If we take the equation of motion of the particle to be

$$\dot{x} = -kx + a\,n(t), \tag{53}$$

where $n(t)$ is white noise as in (50), then, making appropriate use of
the Fokker-Planck equation (52), we find

$$\bar{t} = \frac{2R^2}{a^2} \int_0^1 \exp(-\theta v^2)\, dv \int_v^1 \exp(\theta w^2)\, dw \equiv \frac{R^2}{a^2} f(\theta), \tag{54}$$

where

$$\theta = k \frac{R^2}{a^2}. \tag{55}$$

Fig. 5. Brownian particle in a well with absorbing points at $x = -R$ and $x = R$; $\dot{x} = -kx + an$.



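As for the properties of $f(\theta)$, we have $f(0) = 1$, $f(\theta) > 1$ for $\theta > 0$, and

$$f(\theta) \approx 1 + \frac{\theta}{3}, \quad \theta \text{ small}, \tag{56a}$$

$$f(\theta) = 1.44, \quad \theta = 1, \tag{56b}$$

$$f(\theta) \approx \frac{\sqrt{\pi}\, e^{\theta}}{2\theta^{3/2}}, \quad \theta \text{ large}. \tag{56c}$$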
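The three regimes listed in (56) can be checked by evaluating $f(\theta)$ numerically. The sketch below assumes the double-integral form $f(\theta) = 2\int_0^1 e^{-\theta v^2}\,dv \int_v^1 e^{\theta w^2}\,dw$, exchanges the order of integration so that the inner integral becomes an error function, and applies a plain midpoint rule to the remaining one-dimensional integral. The function names and the number of panels are choices made here, not taken from the paper.

```python
import math

def f_theta(theta, n=20000):
    """f(theta) = 2 * int_0^1 exp(theta*w^2) * G(w) dw, with
    G(w) = int_0^w exp(-theta*v^2) dv expressed through erf."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        w = (i + 0.5) * h                       # midpoint of panel i
        if theta == 0.0:
            inner = w                           # limit of G(w) as theta -> 0
        else:
            root = math.sqrt(theta)
            inner = 0.5 * math.sqrt(math.pi) / root * math.erf(root * w)
        total += math.exp(theta * w * w) * inner
    return 2.0 * h * total

def f_large_theta(theta):
    """Large-theta form: sqrt(pi) * exp(theta) / (2 * theta^(3/2))."""
    return math.sqrt(math.pi) * math.exp(theta) / (2.0 * theta**1.5)

if __name__ == "__main__":
    for theta in (0.0, 0.1, 1.0, 10.0, 57.0):
        print(theta, f_theta(theta))
    print("ratio to asymptote at theta = 57:", f_theta(57.0) / f_large_theta(57.0))
```

The quadrature returns 1 at $\theta = 0$, stays close to $1 + \theta/3$ for small $\theta$, and approaches the exponential asymptote from slightly above as $\theta$ grows.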



Using (48) and (51) to make contact with the equalization parameters,
we have

$$\frac{R^2}{a^2} = \frac{R^2}{2D} = \frac{R^2}{\alpha^2 \mathcal{E}_0 \sigma_a^2} \tag{57}$$

and

$$\theta = \frac{R^2}{\alpha \mathcal{E}_0}. \tag{58}$$

This last equation, in conjunction with (56c), shows that, as $\alpha \to 0$, the
average trapping time for a particle in an isolated well grows exponentially
with $1/\alpha$.
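The well model (53) can also be simulated directly at a moderate $\theta$, where escapes are still frequent enough to observe. The following Euler-Maruyama sketch is an illustration only: the time step, path count, and function name are choices made here, and the discrete time step introduces a small upward bias in the measured escape time.

```python
import numpy as np

def simulate_well_exit(k, a, radius, dt=1e-3, n_paths=2000, seed=0):
    """Mean time for dx = -k x dt + a dW, started at x = 0, to reach |x| >= radius."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    exit_time = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    t = 0.0
    while alive.any():
        t += dt
        n_alive = int(alive.sum())
        noise = a * np.sqrt(dt) * rng.standard_normal(n_alive)
        x[alive] += -k * x[alive] * dt + noise   # Euler-Maruyama update
        escaped = alive & (np.abs(x) >= radius)
        exit_time[escaped] = t                   # record first absorption
        alive &= ~escaped
    return exit_time.mean()

if __name__ == "__main__":
    # k = a = R = 1 gives theta = 1, so (54) and (56b) predict about 1.44.
    print("simulated mean exit time:", simulate_well_exit(1.0, 1.0, 1.0))
```

With $k = a = R = 1$ the prediction from (54) and (56b) is $\bar{t} \approx 1.44$ time units. Repeating the run with a smaller noise amplitude `a` raises $\theta$ and lengthens the escape time rapidly, which is why the large-$\theta$ regime is impractical to simulate this way.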

For the usual four-level decision-directed algorithm, we have already
noted a local minimum at $\mathbf{C}_0 = (2/5, 0, 0, \ldots, 0)$; see Section IV and
Fig. 4. Further, we saw that neither $\mathcal{E}_0$ nor the distance to the
error-free region in the $C_1$ direction (which distance we identify with the $R$
of the above example) depended on the dimension $N$. However, by
stability, $\alpha$ cannot exceed $2/(N\sigma_a^2)$. Therefore, since
$\theta \sim (R^2/\mathcal{E}_0)\sigma_a^2 N$, it
follows from (54) and (56c) that $\bar{t}$ would be enormously long for large
$N$. In practice, for a 32-tap equalizer, the observed shrinkage of the
output signal constellation appears to persist indefinitely. Numerically,
from Fig. 3, we use $R = 0.667 - 0.400$, $\mathcal{E}_0 = 0.2$. Setting $\alpha = 1/(N\sigma_a^2)$, we
have that, for a 32-tap equalizer, $R^2/a^2 = N\theta \approx 1820$, $\theta = 57$. Using (54)
and (56c), this corresponds to $10^{25}$ iterations, on the average, before
leaving the well.
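The arithmetic behind this estimate can be reproduced in a few lines. The sketch below assumes four-level data with levels ±1, ±3, so that $\sigma_a^2 = 5$; this value is not stated in this appendix but reproduces $\theta \approx 57$. It uses the large-$\theta$ form $f(\theta) \approx \sqrt{\pi}\,e^{\theta}/(2\theta^{3/2})$ and works in logarithms to keep the numbers readable. The function name is hypothetical.

```python
import math

def trapping_time_estimate(radius, e0, n_taps, sigma_a2):
    """Return (theta, R^2/a^2, log10 of mean trapping time in iterations)."""
    alpha = 1.0 / (n_taps * sigma_a2)                     # step size used in the text
    theta = radius**2 / (alpha * e0)                      # Eq. (58)
    prefactor = radius**2 / (alpha**2 * e0 * sigma_a2)    # Eq. (57), R^2 / a^2
    # log10 of sqrt(pi) * exp(theta) / (2 * theta^(3/2)), the large-theta f
    log10_f = (theta + 0.5 * math.log(math.pi)
               - math.log(2.0) - 1.5 * math.log(theta)) / math.log(10.0)
    return theta, prefactor, math.log10(prefactor) + log10_f

if __name__ == "__main__":
    # Assumed: sigma_a^2 = 5 for levels +-1, +-3; R and E0 as quoted in the text.
    theta, prefactor, log10_t = trapping_time_estimate(
        radius=0.667 - 0.400, e0=0.2, n_taps=32, sigma_a2=5.0)
    print("theta          =", theta)
    print("R^2 / a^2      =", prefactor)
    print("log10(t_bar)   =", log10_t)
```

With these inputs $\theta$ comes out near 57 and the mean trapping time is of order $10^{25}$ iterations, in line with the figure quoted above. Note that with $\alpha = 1/(N\sigma_a^2)$ the prefactor $R^2/a^2$ equals $N\theta$ exactly.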